
Document image analysis: What is missing?

Internal identifier: 002B11 (Main/Exploration); previous: 002B10; next: 002B12

Authors: George Nagy (computer scientist) [United States]

Source:

Lecture Notes in Computer Science [0302-9743]; 1995.

RBID : ISTEX:98ED88BBEA3D672C8B6B4CD53334028B97E12076

Abstract

Abstract: The conversion of documents into electronic form has proved more difficult than anticipated. Document image analysis still accounts for only a small fraction of the rapidly-expanding document imaging market. Nevertheless, the optimism manifested over the last thirty years has not dissipated. Driven partly by document distribution on CD-ROM and via the World Wide Web, there is more interest in the preservation of layout and format attributes to increase legibility (sometimes called “page reconstruction”) rather than just text/non-text separation. The realization that accurate document image analysis requires fairly specific pre-stored information has resulted in the investigation of new data structures for knowledge bases and for the representation of the results of partial analysis. At the same time, the requirements of downstream software, such as word processing, information retrieval and computer-aided design applications, favor turning the results of the analysis and recognition into some standard format like SGML or DXF. There is increased emphasis on large-scale, automated comparative evaluation, using laboriously compiled test databases. The cost of generating these databases has stimulated new research on synthetic noise models. According to recent publications, the accurate conversion of business letters, technical reports, large typeset repositories like patents, postal addresses, specialized line drawings, and office forms containing a mix of handprinted, handwritten and printed material, is finally on the verge of success.

Url:

https://api.istex.fr/document/98ED88BBEA3D672C8B6B4CD53334028B97E12076/fulltext/pdf

DOI: 10.1007/3-540-60298-4_317

Affiliations:

Links toward previous steps (curation, corpus...)

to stream Istex, to step Corpus: 000018
to stream Istex, to step Curation: 000018
to stream Istex, to step Checkpoint: 001E68
to stream Main, to step Merge: 002C68
to stream Main, to step Curation: 002B11

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Document image analysis: What is missing?</title>
<author><name sortKey="Nagy, George" sort="Nagy, George" uniqKey="Nagy G" first="George" last="Nagy">George Nagy (informaticien)</name>
<affiliation><country>États-Unis</country>
<placeName><settlement type="city">Troy (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="lab" n="5">Institut polytechnique Rensselaer</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:98ED88BBEA3D672C8B6B4CD53334028B97E12076</idno>
<date when="1995" year="1995">1995</date>
<idno type="doi">10.1007/3-540-60298-4_317</idno>
<idno type="url">https://api.istex.fr/document/98ED88BBEA3D672C8B6B4CD53334028B97E12076/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000018</idno>
<idno type="wicri:Area/Istex/Curation">000018</idno>
<idno type="wicri:Area/Istex/Checkpoint">001E68</idno>
<idno type="wicri:doubleKey">0302-9743:1995:Nagy G:document:image:analysis</idno>
<idno type="wicri:Area/Main/Merge">002C68</idno>
<idno type="wicri:Area/Main/Curation">002B11</idno>
<idno type="wicri:Area/Main/Exploration">002B11</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Document image analysis: What is missing?</title>
<author><name sortKey="Nagy, George" sort="Nagy, George" uniqKey="Nagy G" first="George" last="Nagy">George Nagy (informaticien)</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>ECSE, RPI, 12180-3590, Troy, NY</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
<placeName><settlement type="city">Troy (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="lab" n="5">Institut polytechnique Rensselaer</orgName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
<placeName><settlement type="city">Troy (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="lab" n="5">Institut polytechnique Rensselaer</orgName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>1995</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">98ED88BBEA3D672C8B6B4CD53334028B97E12076</idno>
<idno type="DOI">10.1007/3-540-60298-4_317</idno>
<idno type="ChapterID">89</idno>
<idno type="ChapterID">Chap89</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: The conversion of documents into electronic form has proved more difficult than anticipated. Document image analysis still accounts for only a small fraction of the rapidly-expanding document imaging market. Nevertheless, the optimism manifested over the last thirty years has not dissipated. Driven partly by document distribution on CD-ROM and via the World Wide Web, there is more interest in the preservation of layout and format attributes to increase legibility (sometimes called “page reconstruction”) rather than just text/non-text separation. The realization that accurate document image analysis requires fairly specific pre-stored information has resulted in the investigation of new data structures for knowledge bases and for the representation of the results of partial analysis. At the same time, the requirements of downstream software, such as word processing, information retrieval and computer-aided design applications, favor turning the results of the analysis and recognition into some standard format like SGML or DXF. There is increased emphasis on large-scale, automated comparative evaluation, using laboriously compiled test databases. The cost of generating these databases has stimulated new research on synthetic noise models. According to recent publications, the accurate conversion of business letters, technical reports, large typeset repositories like patents, postal addresses, specialized line drawings, and office forms containing a mix of handprinted, handwritten and printed material, is finally on the verge of success.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>État de New York</li>
</region>
<settlement><li>Troy (New York)</li>
</settlement>
<orgName><li>Institut polytechnique Rensselaer</li>
</orgName>
</list>
<tree><country name="États-Unis"><region name="État de New York"><name sortKey="Nagy, George" sort="Nagy, George" uniqKey="Nagy G" first="George" last="Nagy">George Nagy (informaticien)</name>
</region>
<name sortKey="Nagy, George" sort="Nagy, George" uniqKey="Nagy G" first="George" last="Nagy">George Nagy (informaticien)</name>
</country>
</tree>
</affiliations>
</record>
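
As an illustration of how the fields in such a record can be read programmatically, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It works on a reduced copy of the record, not the full text. Two things are assumptions made for the example: the helper name `summarize`, and the `xmlns:wicri` declaration with its placeholder URI. The record as displayed uses the `wicri:` prefix without declaring it, and a namespace-aware parser would reject that as an unbound prefix, so the sketch declares it on the root element.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the record. The xmlns:wicri URI is a placeholder
# (assumption): the record as displayed does not declare this prefix.
RECORD = """<record xmlns:wicri="urn:example:wicri">
<TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc>
<titleStmt><title xml:lang="en">Document image analysis: What is missing?</title></titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<date when="1995" year="1995">1995</date>
<idno type="doi">10.1007/3-540-60298-4_317</idno>
<idno type="wicri:Area/Main/Exploration">002B11</idno>
</publicationStmt>
</fileDesc></teiHeader></TEI>
</record>"""


def summarize(xml_text):
    """Return title, year, DOI and exploration key from a record string."""
    root = ET.fromstring(xml_text)
    pub = root.find(".//publicationStmt")
    # Map each <idno> "type" attribute to its text content.
    ids = {idno.get("type"): idno.text for idno in pub.findall("idno")}
    return {
        "title": root.findtext(".//titleStmt/title"),
        "year": pub.find("date").get("when"),
        "doi": ids.get("doi"),
        "exploration_key": ids.get("wicri:Area/Main/Exploration"),
    }


summary = summarize(RECORD)
print(summary)
```

The `type` values such as `wicri:Area/Main/Exploration` are plain attribute strings, not namespaced names, so they can be used directly as dictionary keys.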

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002B11 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002B11 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:98ED88BBEA3D672C8B6B4CD53334028B97E12076
   |texte=   Document image analysis: What is missing?
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

OCR exploration server
Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
